Convex Optimization 2015.11.25

\(\lVert y \rVert _* = \sup \{ x^T y : \lVert x \rVert \le 1 \}\)

\(\implies x^T y \le \lVert x \rVert \cdot \lVert y \rVert _*\)

(because \(\frac{x^T}{\lVert x \rVert} y \le \lVert y \rVert _*\))

Want inequality of type: \(x^T y \le f(x) + "f^*(y)"\) for "general" \(f\) (Fenchel's Inequality)

  • Definition: For \(f : \mathbb{R}^n \to \mathbb{R}\), the conjugate \(f^*\) of \(f\) is defined by \(f^*(y) = \underset{x}{\sup} (x^T y - f(x))\)

with \(\operatorname{dom} f^* = \) the set of \(y\) for which the \(\sup\) is \(\lt \infty\).
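The definition can be checked numerically. A minimal sketch (the grid, the test function \(f(x) = x^2\) with \(f^*(y) = y^2/4\), and the helper name `conjugate_1d` are my own choices): approximate \(f^*(y) = \sup_x (xy - f(x))\) on a 1-D grid and verify Fenchel's inequality \(xy \le f(x) + f^*(y)\) at sample points.

```python
import numpy as np

def conjugate_1d(f, y, xs):
    """Grid approximation of f^*(y) = sup_x (x*y - f(x))."""
    return np.max(xs * y - f(xs))

f = lambda x: x ** 2                      # f(x) = x^2, so f^*(y) = y^2 / 4
xs = np.linspace(-10, 10, 100_001)
for y in [-2.0, 0.0, 1.5, 3.0]:
    approx = conjugate_1d(f, y, xs)
    assert abs(approx - y ** 2 / 4) < 1e-3    # matches the closed form
    for x in [-1.0, 0.5, 2.0]:
        assert x * y <= f(x) + approx + 1e-9  # Fenchel's inequality
```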

  • Examples:
  1. \(f(x) = a^T x + b \quad (x \in \mathbb{R}^n)\)

\(f^*(y) = \underset{x}{\sup}\, (x^T y - a^T x - b) = \begin{cases} \infty & \text{if } y \neq a \\ -b & \text{if } y = a \end{cases}\)

  2. \(f(x) = -\log x \quad (x \gt 0)\)

\((xy + \log x)' = y + \frac{1}{x} = 0 \implies x = -\frac{1}{y}\)

\(f^*(y) = \underset{x \gt 0}{\sup}\, (xy + \log x) = \begin{cases} \infty & \text{if } y \ge 0 \\ -\log(-y) - 1 & \text{if } y \lt 0 \end{cases}\)
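A quick numerical sanity check of this conjugate (grid bounds and sample values of \(y\) are my own): approximate \(\sup_{x \gt 0} (xy + \log x)\) on a grid and compare with \(-\log(-y) - 1\).

```python
import numpy as np

xs = np.linspace(1e-4, 50, 2_000_001)       # grid over dom f = (0, inf)
log_xs = np.log(xs)
f_star = lambda y: np.max(xs * y + log_xs)  # sup_x (xy - f(x)) for f = -log
for y in [-0.5, -1.0, -3.0]:
    assert abs(f_star(y) - (-np.log(-y) - 1)) < 1e-3
```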

  3. \(f(x) = e^x \quad (x \in \mathbb{R})\)

\((xy - e^x)' = y - e^x = 0 \implies x = \log y\)

\(f^*(y) = \underset{x}{\sup}\, (xy - e^x) = \begin{cases} \infty & \text{if } y \lt 0 \\ y\log y - y & \text{if } y \ge 0 \end{cases}\) (with the convention \(0 \log 0 = 0\))
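The same grid check applies here (bounds and sample values are my own): \(\sup_x (xy - e^x)\) should agree with \(y \log y - y\) for \(y \gt 0\).

```python
import numpy as np

xs = np.linspace(-20, 10, 3_000_001)
exp_xs = np.exp(xs)
f_star = lambda y: np.max(xs * y - exp_xs)  # sup_x (xy - e^x)
for y in [0.5, 1.0, 4.0]:
    assert abs(f_star(y) - (y * np.log(y) - y)) < 1e-3
```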

  4. \(f(x) = x \log x \quad (x \ge 0)\)

\((xy - x \log x)' = y - \log x - 1 = 0 \implies x = e^{y - 1}\)

\(f^*(y) = \underset{x \ge 0}{\sup}\, (xy - x \log x) = y e^{y - 1} - (y - 1) e^{y - 1} = e^{y - 1}\)
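Checking this one the same way (grid and sample values are my own): \(\sup_{x \ge 0} (xy - x \log x)\) should equal \(e^{y-1}\) for all \(y\).

```python
import numpy as np

xs = np.linspace(1e-9, 30, 3_000_001)
x_log_x = xs * np.log(xs)
f_star = lambda y: np.max(xs * y - x_log_x)  # sup_x (xy - x log x)
for y in [-1.0, 0.0, 2.0]:
    assert abs(f_star(y) - np.exp(y - 1)) < 1e-3
```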

  5. \(f(x) = \frac{1}{2} x^T Qx\) with \(Q \in S_{++}^n\)

\(f^*(y) = \underset{x}{\sup}\, \left( x^T y - \frac{1}{2} x^T Qx \right) = y^T Q^{-1} y - \frac{1}{2} y^T Q^{-1} y = \frac{1}{2} y^T Q^{-1} y\)

(for \(A \succ 0\), \(\underset{x}{\inf}\, (x^T Ax + x^T b)\) is attained at \(x = -\frac{1}{2} A^{-1} b\))

So the maximizer is \(x = Q^{-1} y\).

\(\implies x^T y \le \frac{1}{2} x^T Qx + \frac{1}{2} y^T Q^{-1} y\), for all \(Q \succ 0\)
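This inequality is easy to spot-check numerically (the construction of a random \(Q \succ 0\) and the sample sizes are my own): it should hold for arbitrary \(x, y\), with equality at \(x = Q^{-1} y\).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)             # positive definite by construction
Q_inv = np.linalg.inv(Q)
for _ in range(100):
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    assert x @ y <= 0.5 * x @ Q @ x + 0.5 * y @ Q_inv @ y + 1e-9
y = rng.standard_normal(4)
x = Q_inv @ y                            # equality case
assert abs(x @ y - (0.5 * x @ Q @ x + 0.5 * y @ Q_inv @ y)) < 1e-8
```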

  6. \(f(x) = \log \left( \sum _{i = 1}^n e^{x_i} \right)\)

\(f^*(y) = \underset{x}{\sup}\, \left( x^T y - \log \left( \sum _{i = 1}^n e^{x_i} \right) \right)\)

\(\frac{\partial}{\partial x_i} \left( x^T y - \log \left( \sum _{j = 1}^n e^{x_j} \right) \right) = y_i - \frac{e^{x_i}}{\sum _{j = 1}^n e^{x_j}} = 0\)

\(\implies y_i = \frac{e^{x_i}}{\sum _{j = 1}^n e^{x_j}}, \quad y \succeq 0, \; 1^T y = 1\)

Assume for simplicity that \(y \succ 0\).

put \(x_i = \log (y_i)\), then \(\sum e^{x_i} = 1^T y = 1\) and optimality conditions hold

then \(f^*(y) = \sum _{i = 1}^n y_i \log (y_i) - \log (1^T y) = \sum _{i = 1}^n y_i \log (y_i)\)
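This conjugate (the negative entropy on the simplex) can be verified directly (the particular \(y\) and the sample count are my own choices): the value \(\sum_i y_i \log y_i\) should be attained at \(x = \log y\) and upper-bound \(x^T y - f(x)\) everywhere.

```python
import numpy as np

def lse(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())   # stable log-sum-exp

y = np.array([0.2, 0.3, 0.5])                # y ≻ 0, 1^T y = 1
f_star = (y * np.log(y)).sum()               # claimed conjugate value
x_opt = np.log(y)                            # claimed maximizer
assert abs(x_opt @ y - lse(x_opt) - f_star) < 1e-12

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(3) * 3
    assert x @ y - lse(x) <= f_star + 1e-12  # f_star is the sup
```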

  7. \(f(x) = \lVert x \rVert\)

\(f^*(y) = \underset{x}{\sup}\, (x^T y - \lVert x \rVert) = \begin{cases} 0 & \text{if } \lVert y \rVert _* \le 1 \\ \infty & \text{if } \lVert y \rVert _* \gt 1 \end{cases}\)

\(x^T y - \lVert x \rVert \le \lVert x \rVert \cdot \lVert y \rVert _* - \lVert x \rVert = \lVert x \rVert (\lVert y \rVert _* - 1) \le 0\) if \(\lVert y \rVert _* \le 1\), with equality at \(x = 0\); if \(\lVert y \rVert _* \gt 1\), pick \(x\) with \(\lVert x \rVert \le 1\) and \(x^T y \gt 1\), then \(t x^T y - \lVert t x \rVert \to \infty\) as \(t \to \infty\)
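Both cases can be illustrated with a concrete norm (my own example, using the \(\ell_1\) norm whose dual is \(\ell_\infty\)): \(x^T y - \lVert x \rVert_1 \le 0\) whenever \(\lVert y \rVert_\infty \le 1\), and is unbounded above otherwise.

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(200):
    x = rng.standard_normal(3) * 5
    y = rng.uniform(-1, 1, 3)                 # ||y||_inf <= 1
    assert x @ y - np.abs(x).sum() <= 1e-12

y = np.array([1.5, 0.0, 0.0])                 # ||y||_inf > 1
t = 1e6
x = np.array([t, 0.0, 0.0])                   # scale along the violating coordinate
assert x @ y - np.abs(x).sum() == 0.5 * t     # grows linearly in t
```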

  8. \(f(x) = \frac{1}{2} \lVert x \rVert ^2\)

\(f^*(y) = \underset{x}{\sup}\, \left( x^T y - \frac{1}{2} \lVert x \rVert ^2 \right) = \frac{1}{2} \lVert y \rVert _*^2\)

\(x^T y - \frac{1}{2} \lVert x \rVert ^2 \le \lVert x \rVert \cdot \lVert y \rVert _* - \frac{1}{2} \lVert x \rVert ^2 \le \frac{1}{2} \lVert y \rVert _*^2\) (the middle term, as a function of \(t = \lVert x \rVert\), is maximized at \(t = \lVert y \rVert _*\))

\(\implies x^T y \le \frac{1}{2} \lVert x \rVert ^2 + \frac{1}{2} \lVert y \rVert _*^2\)
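A spot-check of this inequality for the \(\ell_2\) norm, which is its own dual (dimension and sample count are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(500):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    assert x @ y <= 0.5 * x @ x + 0.5 * y @ y + 1e-12
x = rng.standard_normal(5)                # equality when y = x
assert abs(x @ x - (0.5 * x @ x + 0.5 * x @ x)) < 1e-12
```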

  • Proof of general hyperplane separation:

Let \(C \subseteq \mathbb{R}^n\) be a convex set and \(H \subseteq \mathbb{R}^n\) the affine subspace of smallest dimension containing \(C\). We write \(C_{\varepsilon} = \{ x : B_{\varepsilon} (x) \cap H \subseteq C \}\).

Then \(C_{\varepsilon} \subseteq \text{relint}(C) := \underset{\varepsilon \gt 0}{\bigcup} C_{\varepsilon}\) (relint: relative interior).

(\(C \subseteq \overline{\text{relint}(C)}\), i.e., \(C\) is contained in the closure of \(\text{relint}(C)\).)

Let \(C, D\) be disjoint convex sets. Then for every \(\varepsilon \gt 0\) the sets \(A_{\varepsilon} = \overline{C_{\varepsilon}} \cap B_{\frac{1}{\varepsilon}}(0)\) and \(\overline{D}\) are closed disjoint convex sets, with \(A_{\varepsilon}\) bounded and \(\text{dist}(A_{\varepsilon}, \overline{D}) \ge \varepsilon \gt 0\).

So \(\exists a_{\varepsilon} \in \mathbb{R}^n\), \(a_{\varepsilon} \neq 0\), \(b_{\varepsilon} \in \mathbb{R}\) s.t. \((a_{\varepsilon}, b_{\varepsilon})\) defines a separating hyperplane for \(A_{\varepsilon}, \overline{D}\):

\(a_{\varepsilon}^T x \le b_{\varepsilon} \; \forall x \in A_{\varepsilon}\), \(a_{\varepsilon}^T x \ge b_{\varepsilon} \; \forall x \in \overline{D}\)

WLOG \(\lVert a_{\varepsilon} \rVert = 1\)

The sequence \((\vec{a}_\frac{1}{n})_{n = 1}^{\infty}\) is a sequence of unit vectors and so has a convergent subsequence, say WLOG convergent to \(a_0 \in \mathbb{R}^n\).

We can assume the sequence \(b_{\frac{1}{n}}\) is bounded (or else one of the sets \(C, D\) is empty),

and so, after passing to a further subsequence, convergent to some value \(b_0 \in \mathbb{R}\).

Want to show that \((a_0, b_0)\) defines a separating hyperplane for \(C, D\), i.e., that

\(a_0^T x \le b_0 \; \forall x \in C, \, a_0^T x \ge b_0 \; \forall x \in D\)

(If \(C\) is not a point, the proof is as above; if \(D\) is not a point, switch the roles of \(C\) and \(D\). If both \(C\) and \(D\) are points, the claim is obviously true.)
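The conclusion can be illustrated on a toy instance (my own example, not part of the proof): for two disjoint closed balls, the hyperplane \((a, b)\) with \(\lVert a \rVert = 1\) through the midpoint of the segment joining the centers separates them.

```python
import numpy as np

c1, r1 = np.array([0.0, 0.0]), 1.0            # C = B_1((0, 0))
c2, r2 = np.array([4.0, 0.0]), 1.0            # D = B_1((4, 0))
a = (c2 - c1) / np.linalg.norm(c2 - c1)       # unit normal, as in ||a_eps|| = 1
b = a @ (c1 + c2) / 2                         # hyperplane through the midpoint
rng = np.random.default_rng(5)
for _ in range(500):
    u = rng.standard_normal(2)
    u *= rng.uniform() / np.linalg.norm(u)    # random point of the unit ball
    assert a @ (c1 + r1 * u) <= b + 1e-12     # a^T x <= b for all x in C
    assert a @ (c2 + r2 * u) >= b - 1e-12     # a^T x >= b for all x in D
```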

Log-convexity and log-concavity

  • Definition: \(f : \mathbb{R}^n \to \mathbb{R}_{\gt 0}\) is log-convex (log-concave) if \(\log (f)\) is convex (concave).

  • Equivalent characterization (convexity of \(\log f\)):

\(\log (f(\theta x + (1 - \theta) y)) \le \theta \log (f(x)) + (1 - \theta) \log (f(y)) = \log (f(x)^{\theta} f(y)^{1 - \theta})\)

\(\iff f(\theta x + (1 - \theta)y) \le f(x)^{\theta} f(y)^{1 - \theta}\)
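This multiplicative inequality can be checked on a concrete log-convex function (my own example): \(f(x) = e^{x^2}\) is log-convex since \(\log f(x) = x^2\) is convex.

```python
import numpy as np

f = lambda x: np.exp(x ** 2)   # log f(x) = x^2 is convex, so f is log-convex
rng = np.random.default_rng(4)
for _ in range(1000):
    x, y = rng.standard_normal(2) * 2
    t = rng.uniform()
    # f(t*x + (1-t)*y) <= f(x)^t * f(y)^(1-t)
    assert f(t * x + (1 - t) * y) <= f(x) ** t * f(y) ** (1 - t) * (1 + 1e-9)
```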

  • Remark: log-convex \(\implies\) convex, since \(f(x) = e^{\log f(x)}\) is the composition of the convex increasing function \(e^u\) with the convex function \(\log f\);

concave \(\implies\) log-concave (for \(f \gt 0\): \(\log\) is concave and increasing, so \(\log f\) is concave).